Skip to content

Conversation

@sudhanshu112233shukla
Copy link

@sudhanshu112233shukla sudhanshu112233shukla commented Jan 22, 2026

#475 I have successfully implemented support for AGENTS.md in Ask Sourcebot. This feature allows repository maintainers to provide explicit, repository-specific instructions to the AI agent by simply adding an AGENTS.md file at the root of their repository. With this change, maintainers can directly influence how the AI behaves when answering questions about their codebase, without needing any external configuration or code changes.

The implementation works by modifying the chat agent’s initialization process so that these instructions are discovered and injected before any conversation begins. The entry point for this logic is the createAgentStream function in packages/web/src/features/chat/agent.ts. During initialization, the agent now iterates over every repository included in the user’s Search Scope. For each repository, it attempts to fetch a file named AGENTS.md using the existing getFileSource API. This logic runs asynchronously using Promise.all to ensure all repositories are checked efficiently. If the file exists, its contents are captured along with the repository name. If the file does not exist or an error occurs, the error is handled gracefully so that missing files do not break or block agent initialization.

Once all repositories have been checked, the collected AGENTS.md contents are formatted into a single structured string. Each entry clearly includes the repository name, the file name (AGENTS.md), and the full contents of the file. This ensures that instructions from multiple repositories can coexist and remain clearly attributable to their source.

I then updated the createBaseSystemPrompt function in the same file to accept this aggregated AGENTS.md content as an additional input. If any instructions are present, they are appended directly to the system prompt and wrapped inside a dedicated <repository_instructions> XML tag. The prompt text explicitly states that these instructions are verified and provided by the repository maintainers, and that the AI must follow them. This structure ensures the LLM interprets the instructions as high-priority system context rather than optional guidance or user input.

This approach guarantees that repository-defined rules are injected before the conversation starts and are consistently available throughout the agent’s reasoning process. By embedding them directly into the system prompt, the instructions take precedence over user prompts when applicable, which is exactly the intended behavior for maintainer-authored guidance.

To ensure the feature works end-to-end, I created a temporary unit test suite in agent.test.ts. These tests simulate the entire flow in isolation. The file system and API layer were mocked to simulate a repository containing an AGENTS.md file. The tests verify that createAgentStream correctly calls the file-fetching logic for each repository in the search scope and successfully retrieves the mocked AGENTS.md content. Additionally, the tests assert that createBaseSystemPrompt includes the fetched instructions in the final system prompt, wrapped exactly inside the <repository_instructions> XML block with the expected content.

All tests passed successfully, confirming that the fetch logic, prompt injection, and overall integration are working as intended. This implementation cleanly extends the existing architecture, requires no breaking changes, and provides a simple, repository-native way for maintainers to control how Ask Sourcebot’s AI agent interprets and responds to their codebase.

Summary by CodeRabbit

  • New Features
    • Chat agent now collects, sanitizes, and truncates repository AGENTS.md content and embeds it into system prompts when available, improving contextual awareness so responses better reflect repo-specific instructions and help the agent select appropriate tools and guidance during conversations.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Jan 22, 2026

Walkthrough

Added retrieval and aggregation of repository AGENTS.md files, sanitized and truncated, then injected via a new agentsMdContent option into the base system prompt used by agent creation and streaming.

Changes

Cohort / File(s) Summary
Agent system prompt & stream
packages/web/src/features/chat/agent.ts
Added agentsMdContent?: string to BaseSystemPromptOptions; createBaseSystemPrompt now accepts and conditionally embeds agentsMdContent as a repository_instructions block. createAgentStream now fetches AGENTS.md from repositories (parallel), truncates/sanitizes/aggregates results into agentsMdContent, and passes it into the base prompt. Minor formatting alignment change.

Sequence Diagram(s)

sequenceDiagram
  participant Client as Client
  participant Stream as createAgentStream
  participant Repos as RepoFetcher
  participant Prompt as createBaseSystemPrompt
  participant Agent as AgentRuntime

  Client->>Stream: initiate agent request
  Stream->>Repos: fetch AGENTS.md from repositories (parallel)
  Repos-->>Stream: AGENTS.md contents (truncated & sanitized)
  Stream->>Prompt: build base prompt (include agentsMdContent)
  Prompt-->>Stream: system prompt (with repository_instructions if present)
  Stream->>Agent: initialize agent with system prompt + tools
  Agent-->>Client: stream responses
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: support AGENTS.md in Ask Sourcebot' clearly and concisely summarizes the main change: adding support for AGENTS.md files to the AI agent.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@packages/web/src/features/chat/agent.ts`:
- Around line 205-209: The template injects raw AGENTS.md into the system prompt
via agentsMdContent, creating a prompt-injection risk; update the code that
builds the prompt (the template using agentsMdContent) to (1) impose a
configurable max length on agentsMdContent and truncate with ellipses, (2)
sanitize/escape dangerous patterns (strip or escape XML/HTML tags, remove or
neutralize strings that resemble "system" or instruction overrides, and remove
sequences like "</system>" or other closing tags), and (3) add a comment/log
entry and configuration flag documenting that enabling AGENTS.md is a trust
decision; locate and change the prompt construction where agentsMdContent is
interpolated and apply these transformations before injection.
🧹 Nitpick comments (1)
packages/web/src/features/chat/agent.ts (1)

41-54: Consider adding a constant for the AGENTS.md filename and logging for observability.

The hardcoded 'AGENTS.md' string could be extracted to a constant in constants.ts for consistency and easier maintenance. Additionally, consider logging when AGENTS.md files are successfully fetched for debugging purposes.

♻️ Suggested improvement

In constants.ts:

export const AGENTS_MD_FILENAME = 'AGENTS.md';

Then in this file:

     const agentsMdResults = await Promise.all(searchScopeRepoNames.map(async (repo) => {
         const result = await getFileSource({
-            fileName: 'AGENTS.md',
+            fileName: AGENTS_MD_FILENAME,
             repository: repo,
         });

         if (isServiceError(result)) {
             return null;
         }
+
+        logger.debug(`Fetched AGENTS.md from ${repo}`);
         return `Repository: ${repo}\nFile: AGENTS.md\nContent:\n${result.source}`;
     }));

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@packages/web/src/features/chat/agent.ts`:
- Around line 59-60: The current sanitization in agent.ts only replaces the
specific string '</repository_instructions>' on the variable content, leaving
other XML-style closing tags (e.g., </workflow>, </available_repositories>,
</research_phase_instructions>, </answer_instructions>) vulnerable to prompt
injection; update the handling of content (the same variable and its .replace
usage) to either escape or remove all closing XML-like tags by applying a
broader transformation (e.g., a single regex that matches any closing tag like
</...> and replaces or escapes it, or strip angle brackets entirely) so every
closing XML-style tag is covered.
🧹 Nitpick comments (1)
packages/web/src/features/chat/agent.ts (1)

216-220: Consider rewording "verified" — content is fetched, not authenticated.

The phrase "verified instructions from the repository maintainers" implies authentication or validation of authorship. In practice, AGENTS.md is simply fetched from the repository without cryptographic verification of who authored it. Anyone with write access to the repo could modify it.

Consider softening the language to accurately reflect the trust model:

📝 Suggested wording
 ${agentsMdContent ? `<repository_instructions>
-The following are verified instructions from the repository maintainers (AGENTS.md). You MUST follow these instructions when generating code or answering questions for these repositories.
+The following instructions are from repository AGENTS.md files. Follow these guidelines when generating code or answering questions for these repositories.

 ${agentsMdContent}
 </repository_instructions>` : ''}

@sudhanshu112233shukla
Copy link
Author

@brendan-kellam @msukkari do check it !!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant